The Pulse of News in Social Media: Forecasting Popularity
News articles are extremely time-sensitive by nature. There is also intense
competition among news items to propagate as widely as possible. Hence, the
task of predicting the popularity of news items on the social web is both
interesting and challenging. Prior research has dealt with predicting eventual
online popularity based on early popularity. It is most desirable, however, to
predict the popularity of items prior to their release, which makes it possible
to make informed decisions about modifying an article and the manner of its
publication. In this paper, we construct a multi-dimensional feature
space derived from properties of an article and evaluate the efficacy of these
features to serve as predictors of online popularity. We examine both
regression and classification algorithms and demonstrate that despite
randomness in human behavior, it is possible to predict ranges of popularity on
Twitter with an overall accuracy of 84%. Our study also serves to illustrate the
differences between traditionally prominent sources and those immensely popular
on the social web.
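Predicting "ranges of popularity" turns a count-valued target into a classification problem. The following is a minimal sketch, assuming popularity is measured as the number of tweets mentioning an article; the class boundaries `BOUNDARIES` and the labels in `LABELS` are hypothetical placeholders, not the ranges used in the paper.

```python
from bisect import bisect_right

# Hypothetical range boundaries on tweet counts (not the paper's values).
BOUNDARIES = [10, 100, 1000]
LABELS = ["low", "medium", "high", "very high"]


def popularity_class(tweet_count: int) -> str:
    """Map a raw tweet count to a popularity-range label.

    Counts below 10 are "low", 10..99 "medium", 100..999 "high",
    and 1000 or more "very high".
    """
    return LABELS[bisect_right(BOUNDARIES, tweet_count)]


def accuracy(predicted: list[str], actual: list[str]) -> float:
    """Fraction of articles whose predicted range matches the observed range."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty lists of equal length")
    hits = sum(p == a for p, a in zip(predicted, actual))
    return hits / len(actual)
```

With these boundaries, `popularity_class(150)` returns `"high"`, and a classifier's predicted labels can be scored against the observed ranges with `accuracy`.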
Blind Men and the Elephant: Detecting Evolving Groups in Social News
We propose an automated and unsupervised methodology for a novel
summarization of group behavior based on content preference. We show that graph
theoretical community evolution (based on similarity of user preference for
content) is effective in indexing these dynamics. Combined with text analysis
that targets automatically-identified representative content for each
community, our method produces a novel multi-layered representation of evolving
group behavior. We demonstrate this methodology in the context of political
discourse on a social news site with data that spans more than four years and
find coexisting political leanings over extended periods and a disruptive
external event that led to a significant reorganization of existing patterns.
Finally, where no ground truth exists, we propose a new evaluation
approach that uses entropy measures as evidence of coherence along the evolution
path of these groups. This methodology is valuable to designers and managers of
online forums in need of granular analytics of user activity, as well as to
researchers in social and political sciences who wish to extend their inquiries
to large-scale data available on the web.
Comment: 10 pages, icwsm201
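Entropy as a coherence signal can be illustrated with Shannon entropy over what a community engages with. This is a minimal sketch, assuming each community snapshot is represented as a list of content labels (for example, topic tags of the stories its members promoted); the abstract does not specify which distribution the authors measure, so `shannon_entropy` and `entropy_along_path` are illustrative helpers only. Lower entropy means the community's preferences are concentrated on fewer kinds of content.

```python
import math
from collections import Counter
from typing import Hashable, Iterable, Sequence


def shannon_entropy(items: Iterable[Hashable]) -> float:
    """Shannon entropy, in bits, of the empirical distribution of items."""
    counts = Counter(items)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # Sum of p * log2(1/p); each term is non-negative.
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def entropy_along_path(snapshots: Sequence[Iterable[Hashable]]) -> list[float]:
    """Entropy of each snapshot of one community along its evolution path."""
    return [shannon_entropy(s) for s in snapshots]
```

For instance, a snapshot with labels `["politics", "politics", "sports", "sports"]` has an entropy of 1.0 bit, while a snapshot containing only `"politics"` has 0.0, i.e. it is maximally coherent under this measure.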
Predicting Rising Follower Counts on Twitter Using Profile Information
When evaluating the cause of one's popularity on Twitter, one factor is
usually considered the main driver: the number of tweets. There is debate about
the kind of tweets one should publish, but little attention to anything beyond
tweets. Of particular interest is the information provided by each Twitter
user's profile page. One such feature is the given name on the profile. Studies
in psychology and economics have identified correlations between one's first
name and, e.g., one's school marks or chances of getting a job interview in the
US. Therefore, we are interested in the influence of this profile information
on the follower count. We addressed this question by analyzing the profiles of
about 6 million Twitter users. All profiles are separated into three groups:
users that have a first name, English words, or neither in their name field. The assumption is
that names and words influence the discoverability of a user and subsequently
his/her follower count. We propose a classifier that labels users who will
increase their follower count within a month by applying different models based
on the user's group. The classifiers are evaluated with the area under the
receiver operating characteristic curve (ROC AUC) and achieve a score above 0.800.
Comment: 10 pages, 3 figures, 8 tables, WebSci '17, June 25–28, 2017, Troy, NY, US
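Two steps in this abstract can be sketched concretely: assigning each profile to one of the three name-field groups, and scoring a classifier with ROC AUC. This is a minimal sketch under stated assumptions. `name_group` is a hypothetical helper that tokenizes the name field on whitespace and checks a first-name list before an English word list (the paper's actual matching rules and precedence are not given here). `roc_auc` uses the rank-statistic definition of AUC: the probability that a randomly chosen positive (a user whose follower count rises) is scored above a randomly chosen negative, with ties counted as half.

```python
from typing import Sequence, Set


def name_group(name_field: str, first_names: Set[str], english_words: Set[str]) -> str:
    """Assign a profile to a group; precedence (first name wins) is an assumption."""
    tokens = name_field.lower().split()
    if any(t in first_names for t in tokens):
        return "first_name"
    if any(t in english_words for t in tokens):
        return "english_word"
    return "neither"


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as P(score of random positive > score of random negative).

    Ties count as half a win. This pairwise O(P * N) version is meant for
    illustration, not for millions of profiles.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

For example, `roc_auc([1, 0, 1, 0], [0.8, 0.6, 0.4, 0.2])` returns `0.75`: three of the four positive/negative pairs are ordered correctly. In a per-group setup, one would train a separate model for each value returned by `name_group` and report AUC per group.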
An Automated Pipeline for Character and Relationship Extraction from Readers' Literary Book Reviews on Goodreads.com
Reader reviews of literary fiction on social media, especially those in
persistent, dedicated forums, create and are in turn driven by underlying
narrative frameworks. In their comments about a novel, readers generally
include only a subset of characters and their relationships, thus offering a
limited perspective on that work. Yet in aggregate, these reviews capture an
underlying narrative framework composed of different actants (people, places,
things), their roles, and interactions that we label the "consensus narrative
framework". We represent this framework in the form of an actant-relationship
story graph. Extracting this graph is a challenging computational problem,
which we pose as a latent graphical model estimation problem. Posts and reviews
are viewed as samples of subgraphs/subnetworks of the hidden narrative framework.
Inspired by the qualitative narrative theory of Greimas, we formulate a
graphical generative Machine Learning (ML) model where nodes represent actants,
and multi-edges and self-loops among nodes capture context-specific
relationships. We develop a pipeline of interlocking automated methods to
extract key actants and their relationships, and apply it to thousands of
reviews and comments posted on Goodreads.com. We manually derive the ground
truth narrative framework from SparkNotes, and then use word embedding tools to
compare relationships in ground truth networks with our extracted networks. We
find that our automated methodology generates highly accurate consensus
narrative frameworks: for our four target novels, with approximately 2900
reviews per novel, we report average coverage/recall of important relationships
of >80% and an average edge detection rate of >89%. These extracted narrative
frameworks can generate insight into how people (or classes of people) read and
how they recount what they have read to others.
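The coverage/recall figure compares ground-truth relationships against extracted ones. Below is a minimal sketch, assuming each relationship is a `(source actant, target actant, label)` triple. The authors compare relationships with word-embedding tools; this sketch substitutes a case-insensitive exact match (`exact_match`, a hypothetical stand-in) and accepts any other `match` function, so an embedding-similarity check could be plugged in instead.

```python
from typing import Callable, Iterable, Tuple

# (source actant, target actant, relationship label)
Edge = Tuple[str, str, str]


def exact_match(a: Edge, b: Edge) -> bool:
    """Case-insensitive exact comparison; a crude stand-in for embedding similarity."""
    return tuple(x.lower() for x in a) == tuple(x.lower() for x in b)


def relationship_recall(
    ground_truth: Iterable[Edge],
    extracted: Iterable[Edge],
    match: Callable[[Edge, Edge], bool] = exact_match,
) -> float:
    """Fraction of ground-truth edges matched by at least one extracted edge."""
    truth = list(ground_truth)
    found = list(extracted)
    if not truth:
        raise ValueError("ground truth must contain at least one edge")
    covered = sum(any(match(t, e) for e in found) for t in truth)
    return covered / len(truth)
```

With hypothetical ground truth `[("Hero", "Mentor", "trusts"), ("Hero", "Rival", "fights")]` and extracted edges `[("hero", "mentor", "trusts"), ("Mentor", "Rival", "warns")]`, the recall is 0.5, since only the first ground-truth relationship is recovered.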
Gestalt Computing and the Study of Content-oriented User Behavior on the Web
Elementary actions online establish an individual's existence on the web and her/his orientation toward different issues. In this sense, actions truly define a user in spaces like online forums and communities, and the aggregate of elementary actions shapes the atmosphere of these online spaces. This observation, coupled with the unprecedented scale and detail of data on user actions on the web, compels us to use these actions to understand collective human behavior. Despite large investments by industry to capture this data and the expanding body of research on big data in academia, gaining insight into collective user behavior online has been elusive. If one is able to overcome the considerable computational challenges posed by both the scale and the inevitable noisiness of the associated data sets, one could provide new automated frameworks to extract insights into evolving behavior at different scales, and form an altogether different perspective on aggregated elementary user actions. This thesis addresses this fundamental and pressing problem and offers a gestalt computing approach to studying complex social phenomena in large datasets. This approach involves extracting macro structures from aggregated user actions, finding their possible meanings, and arranging data in layers so that it is iteratively explorable. The dissertation includes three major sections: first, modeling and prediction of the diffusion of information by users on the social web; next, detection of topics promoted by user communities; and finally, presentation of the gestalt computing framework through a methodology that uses graph theory, language processing, and information theory to provide a top-down map of group dynamics on social news websites. We find not only statistical significance in the extracted structure but also that the results are meaningful to human understanding. The efficacy of the proposed methodologies is established via multiple real-world data sets.
Communication vs. Performance in Source Localization
Acoustic source localization often requires the transmission of full received waveforms to a fusion center. Using these waveforms, the location of a source can be estimated by different methods such as Beamforming, MUSIC, or AML. In any of these cases, a large number of bits is communicated to the fusion center. When communication has to be done wirelessly, a considerable amount of energy is expended, and where power is not readily available, this can shorten the lifetime of the system. We are interested in investigating how much accuracy is lost by reducing the number of bits transmitted by each sensor. This poster presents a study of the tradeoffs between localization performance and the number of bits transmitted. A few cases were simulated in which sensors can measure signal power and transmit only one bit in one case and two bits in the other.
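Reducing each sensor's report to one or two bits amounts to quantizing its measured signal power. This is a minimal sketch, assuming a uniform quantizer over a known power range in dB (`lo_db`, `hi_db`) and midpoint reconstruction at the fusion center; the poster does not state which quantizer was simulated, so both `quantize_power` and `reconstruct` are hypothetical. With one bit, the scheme reduces to a single threshold test at the middle of the range.

```python
def quantize_power(power_db: float, bits: int, lo_db: float, hi_db: float) -> int:
    """Uniformly quantize a power reading to one of 2**bits cell indices.

    Readings outside [lo_db, hi_db] are clipped to the range first.
    """
    if bits < 1:
        raise ValueError("bits must be >= 1")
    if hi_db <= lo_db:
        raise ValueError("need hi_db > lo_db")
    levels = 2 ** bits
    clipped = min(max(power_db, lo_db), hi_db)
    step = (hi_db - lo_db) / levels
    # The top edge (clipped == hi_db) would map to `levels`, so cap it.
    return min(int((clipped - lo_db) / step), levels - 1)


def reconstruct(index: int, bits: int, lo_db: float, hi_db: float) -> float:
    """Fusion-center estimate of the power: the midpoint of the quantization cell."""
    step = (hi_db - lo_db) / 2 ** bits
    return lo_db + (index + 0.5) * step
```

Over a 60 dB range (for example, -100 dB to -40 dB), the worst-case reconstruction error for in-range readings is half a cell: 15 dB with one bit and 7.5 dB with two bits. Each extra bit halves this error while increasing the per-sensor transmission cost, which is the tradeoff the poster studies.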